1 Item-based top-N recommendation algorithms
Mukund Deshpande, George Karypis 

January 2004 ACM Transactions on Information Systems (TOIS), volume 22 issue 1 
Full text available: pdf Additional Information: full citation, abstract, references, index terms

The explosive growth of the world-wide-web and the emergence of e-commerce have led to
the development of recommender systems, a personalized information filtering technology
used to identify a set of items that will be of interest to a certain user. User-based
collaborative filtering is the most successful technology for building recommender systems
to date and is extensively used in many commercial recommender systems. Unfortunately,
the computational complexity of these methods grows l ...



Keywords: e-commerce, predicting user behavior, world wide web 



2 Probabil...
Dmitry Pavlov, Padhraic Smyth 

August 2001 Proceedings of the seventh ACM SIGKDD international conference on
Knowledge discovery and data mining

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

We investigate the application of Bayesian networks, Markov random fields, and mixture 
models to the problem of query answering for transaction data sets. We formulate two 
versions of the querying problem: the query selectivity estimation (i.e., finding exact counts 
for tuples in a data set) and the query generalization problem (i.e., computing the 
probability that a tuple will occur in new data). We show that frequent itemsets are useful 
for reducing the original data to a compressed representa ... 

3 Special issue on the fusion of domain knowledge with data for decision support: Fusion
of domain knowledge with data for structural learning in object oriented domains
Helge Langseth, Thomas D. Nielsen 

December 2003 The Journal of Machine Learning Research, volume 4 

Full text available: pdf Additional Information: full citation, abstract, index terms

When constructing a Bayesian network, it can be advantageous to employ structural 
learning algorithms to combine knowledge captured in databases with prior information 
provided by domain experts. Unfortunately, conventional learning algorithms do not easily
incorporate prior information, if this information is too vague to be encoded as properties 
that are local to families of variables. For instance, conventional algorithms do not exploit 
prior information about repetitive structures, which are ... 

4 Posters: Combining speech and haptics for intuitive and efficient navigation through ...
Thomas Kaster, Michael Pfeiffer, Christian Bauckhage 

November 2003 Proceedings of the 5th international conference on Multimodal 
interfaces 

Full text available: pdf(239.65 KB) Additional Information: full citation, abstract, references, index terms

Given the size of today's professional image databases, the standard approach to object- or
theme-related image retrieval is to interactively navigate through the content. But as most
users of such databases are designers or artists who do not have a technical background,
navigation interfaces must be intuitive to use and easy to learn. This paper reports on
efforts towards this goal. We present a system for intuitive image retrieval that features
different modalities for interaction. Apart f ...

Keywords: content-based image retrieval, fusion of haptics, multimodal interface 
evaluation, speech, vision processing 

5 Multi Relational Data Mining (MRDM): Probabilistic logic learning
Luc De Raedt, Kristian Kersting 

July 2003 ACM SIGKDD Explorations Newsletter, volume 5 issue 1

Full text available: pdf Additional Information: full citation, abstract, references, citings

The past few years have witnessed a significant interest in probabilistic logic learning, i.e.
in research lying at the intersection of probabilistic reasoning, logical representations, and 
machine learning. A rich variety of different formalisms and learning techniques have been 
developed. This paper provides an introductory survey and overview of the state-of-the-art 
in probabilistic logic learning through the identification of a number of important 
probabilistic, logical and learning concept ... 

Keywords: data mining, inductive logic programming, machine learning, multi-relational 
data mining, probabilistic reasoning, uncertainty 



6 Video retrieval: Semi-supervised learning for facial expression recognition
Ira Cohen, Nicu Sebe, Fabio G. Cozman, Thomas S. Huang 

November 2003 Proceedings of the 5th ACM SIGMM international workshop on 
Multimedia information retrieval 

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Automatic classification by machines is one of the basic tasks required in any pattern 
recognition and human computer interaction applications. In this paper, we discuss training 
probabilistic classifiers with labeled and unlabeled data. We provide an analysis which 
shows under what conditions unlabeled data can be used in learning to improve 
classification performance. We discuss the implications of this analysis to a specific type of 
probabilistic classifiers, Bayesian networks, and propose a ... 

Keywords: Bayesian networks, facial expression recognition, semi-supervised learning 

Yoseph Barash, Gal Elidan, Nir Friedman, Tommy Kaplan

April 2003 Proceedings of the seventh annual international conference on
Computational molecular biology

Full text available: pdf(411.94 KB) Additional Information: full citation, abstract, references, citings, index terms

The availability of whole genome sequences and high-throughput genomic assays opens the 
door for in silico analysis of transcription regulation. This includes methods for discovering 
and characterizing the binding sites of DNA-binding proteins, such as transcription factors. 
A common representation of transcription factor binding sites is a position specific score 
matrix (PSSM). This representation makes the strong assumption that binding site positions 
are independent of each othe ... 

Keywords: DNA sequence motifs, bayesian networks, factors binding sites, transcription 



8 Multi Relational Data Mining (MRDM): Biological applications of multi-relational data
mining

David Page, Mark Craven 

July 2003 ACM SIGKDD Explorations Newsletter, volume 5 issue 1

Full text available: pdf(1.12 MB) Additional Information: full citation, abstract, references, citings

Biological databases contain a wide variety of data types, often with rich relational 
structure. Consequently multi-relational data mining techniques frequently are applied to 
biological data. This paper presents several applications of multi-relational data mining to 
biological data, taking care to cover a broad range of multi-relational data mining 
techniques. 



9 Survey articles ...
Soumen Chakrabarti 

January 2000 ACM SIGKDD Explorations Newsletter, volume 1 issue 2

Full text available: pdf(1.19 MB) Additional Information: full citation, abstract, references, citings

With over 800 million pages covering most areas of human endeavor, the World-wide Web 
is a fertile ground for data mining research to make a difference to the effectiveness of 
information search. Today, Web surfers access the Web through two dominant interfaces: 
clicking on hyperlinks and searching via keyword queries. This process is often tentative 
and unsatisfactory. Better support is needed for expressing one's information need and 
dealing with a search result in more structured ways than av ... 



10 Context...
Yoseph Barash, Nir Friedman 

April 2001 Proceedings of the fifth annual international conference on Computational 
biology 

Full text available: pdf(233.32 KB) Additional Information: full citation, abstract, references, citings, index terms

The recent growth in genomic data and measurement of genome-wide expression patterns
makes it possible to examine gene regulation by transcription factors using computational tools. In this
work, we present a class of mathematical models that help in understanding the 
connections between transcription factors and functional classes of genes based on genetic 
and genomic data. These models represent the joint distribution of transcription factor 
binding sites and of expression levels of a gene in a single ... 

11 Towards automated synthesis of data mining programs
Wray Buntine, Bernd Fischer, Thomas Pressburger 

August 1999 Proceedings of the fifth ACM SIGKDD international conference on 
Knowledge discovery and data mining

Full text available: pdf(637.67 KB) Additional Information: full citation, references, index terms



12 Protocols...
Chun Zeng, Chun-Xiao Xing, Li-Zhu Zhou 

May 2003 Proceedings of the twelfth international conference on World Wide Web 

Full text available: pdf(306.48 KB) Additional Information: full citation, abstract, references, index terms

Collaborative filtering has been very successful in both research and applications such as 
information filtering and E-commerce. The k-Nearest Neighbor (KNN) method is a popular
way to realize it. Its key technique is to find k nearest neighbors for a given user to
predict his interests. However, this method suffers from two fundamental problems: 
sparsity and scalability. In this paper, we present our solutions for these two problems. We 
adopt two techniques: a matrix conversion method for ... 

Keywords: collaborative filtering, instance selection, similarity measure 

13 From promoter sequence to expression: a probabilistic framework
Eran Segal, Yoseph Barash, Itamar Simon, Nir Friedman, Daphne Koller 

April 2002 Proceedings of the sixth annual international conference on Computational 
biology 

Full text available: pdf(3.22 MB) Additional Information: full citation, abstract, citings, index terms

We present a probabilistic framework that models the process by which transcriptional 
binding explains the mRNA expression of different genes. Our joint probabilistic model 
unifies the two key components of this process: the prediction of gene regulation events 
from sequence motifs in the gene's promoter region, and the prediction of mRNA expression 
from combinations of gene regulation events in different settings. Our approach has several 
advantages. By learning promoter sequence motifs that ar ... 

14 A discriminative model for identifying spatial cis-regulatory modules
Eran Segal, Roded Sharan 

March 2004 Proceedings of the eighth annual international conference on 
Computational molecular biology 

Full text available: pdf(280.23 KB) Additional Information: full citation, abstract, references, index terms

Transcriptional regulation is mediated by the coordinated binding of transcription factors to 
the upstream region of genes. In higher eukaryotes, the binding sites of cooperating 
transcription factors are organized into short sequence units, called cis-regulatory modules. 
In this paper we propose a method for identifying modules of transcription factor binding 
sites in a set of co-regulated genes, using only the raw sequence data as input. Our method 
is based on a novel probabilistic model that ... 

Keywords: cis-regulatory module, probabilistic model, transcriptional regulation 



15 Motion texture: a two-level statistical model for character motion synthesis
Yan Li, Tianshu Wang, Heung-Yeung Shum 

July 2002 ACM Transactions on Graphics (TOG), Proceedings of the 29th annual
conference on Computer graphics and interactive techniques, volume 21 issue 3

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

In this paper, we describe a novel technique, called motion texture, for synthesizing 
complex human-figure motion (e.g., dancing) that is statistically similar to the original
motion captured data. We define motion texture as a set of motion textons and their 
distribution, which characterize the stochastic and dynamic nature of the captured motion. 
Specifically, a motion texton is modeled by a linear dynamic system (LDS) while the texton 
distribution is represented by a transition matrix indicat ... 

Keywords: linear dynamic systems, motion editing, motion synthesis, motion texture, 
texture synthesis 



16 Hypertext data mining (tutorial AM-1)
Soumen Chakrabarti 

August 2000 Tutorial notes of the sixth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(1.08 MB) Additional Information: full citation, index terms



17 Web search ...
Pavel Calado, Altigran S. da Silva, Rodrigo C. Vieira, Alberto H. F. Laender, Berthier A. Ribeiro-Neto

November 2002 Proceedings of the eleventh international conference on Information
and knowledge management

Full text available: pdf(204.22 KB) Additional Information: full citation, abstract, references, citings, index terms

On-line information services have become widespread on the Web nowadays. However, Web
users are non-specialized and have a great variety of interests. Thus, interfaces for Web 
databases must be simple and uniform. In this paper we present an approach, based on 
Bayesian networks, for querying Web databases using keywords only. According to this 
approach, the user inputs a query through a simple search-box interface. From the input 
query, one or more plausible structured queries are derived and su ... 

Keywords: query structuring, structured queries, web databases 



18 Industry...
network learning

Peter Antal, Patrick Glenisson, Geert Fannes 

July 2002 Proceedings of the eighth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Thanks to its increasing availability, electronic literature can now be a major source of 
information when developing complex statistical models where data is scarce or contains 
much noise. This raises the question of how to integrate information from domain literature 
with statistical data. Because quantifying similarities or dependencies between variables is a 
basic building block in knowledge discovery, we consider here the following question. Which 
vector representations of text and which st ... 

Keywords: Bayesian networks, clustering, data mining, text mining 

Geoff Hulten, Pedro Domingos

July 2002 Proceedings of the eighth ACM SIGKDD international conference on 
Knowledge discovery and data mining

Full text available: pdf(653.58 KB) Additional Information: full citation, abstract, references, citings, index terms

In this paper we propose a scaling-up method that is applicable to essentially any induction 
algorithm based on discrete search. The result of applying the method to an algorithm is 
that its running time becomes independent of the size of the database, while the decisions 
made are essentially identical to those that would be made given infinite data. The method 
works within pre-specified memory limits and, as long as the data is iid, only requires 
accessing it sequentially. It gives anytime resu ... 

Keywords: Bayesian networks, Hoeffding bounds, discrete search, scalable learning 
algorithms, subsampling 



20 Classification ...
Rodrigo C. Vieira, Pavel Calado, Altigran S. da Silva, Alberto H. F. Laender, Berthier A. Ribeiro-Neto

July 2002 Proceedings of the second ACM/IEEE-CS joint conference on Digital 
libraries 

Full text available: pdf(116.95 KB) Additional Information: full citation, abstract, references, index terms

This paper describes a framework, based on Bayesian belief networks, for querying Web 
databases using keywords only. According to this framework, the user inputs a query 
through a simple search-box. From the input query, one or more plausible structured 
queries are derived and submitted to Web databases. The results are then retrieved and 
presented to the user as ranked answers. To evaluate our framework, an experiment using 
38 example queries was carried out. We found that 97% of the time, ...

Keywords: bayesian belief networks, web databases, web query 
21 KDD-99 conference reports: Profiling your customers using Bayesian networks
Paola Sebastiani, Marco Ramoni, Alexander Crea 
January 2000 ACM SIGKDD Explorations Newsletter, volume 1 issue 2

Full text available: pdf Additional Information: full citation, abstract



This report describes a complete Knowledge Discovery session using Bayesware Discoverer, 
a program for the induction of Bayesian networks from incomplete data. We build two 
causal models to help an American Charitable Organization understand the characteristics 
of respondents to direct mail fund raising campaigns. The first model is a Bayesian network 
induced from the database of 96,376 lapsed donors to the June '97 renewal mailing. The
network describes the dependency of the probability of resp ... 

Keywords: Bayesian networks, customer profiling, missing data 



22 Research sessions: text and DB: When one sample is not enough: improving text
database selection ...

Panagiotis G. Ipeirotis, Luis Gravano 

June 2004 Proceedings of the 2004 ACM SIGMOD international conference on 
Management of data 

Full text available: pdf(391.26 KB) Additional Information: full citation, abstract, references

Database selection is an important step when searching over large numbers of distributed 
text databases. The database selection task relies on statistical summaries of the database 
contents, which are not typically exported by databases. Previous research has developed 
algorithms for constructing an approximate content summary of a text database from a 
small document sample extracted via querying. Unfortunately, Zipf's law practically
guarantees that content summaries built this way for any rela ...

23 Automatical...
structur...

Marcos André Gonçalves, Edward A. Fox, Aaron Krowne, Pavel Calado, Alberto H. F. Laender,
Altigran S. da Silva, Berthier Ribeiro-Neto 

June 2004 Proceedings of the 2004 joint ACM/IEEE conference on Digital libraries 

Full text available: pdf Additional Information: full citation, abstract, references, index terms
Structured or fielded metadata is the basis for many digital library services, including 
searching and browsing. Yet, little is known about the impact of using structure on the
effectiveness of such services. In this paper, we investigate a key research question: do 
structured queries improve effectiveness in DL searching? To answer this question, we 
empirically compared the use of unstructured queries to the use of structured queries. We 
then tested the capability of a simple Bayesian network s ... 

Keywords: bayesian networks, digital libraries, structured queries 

24 Special issue ...:
Combining knowledge from different sources in causal probabilistic models

Marek J. Druzdzel, Francisco J. Díez

December 2003 The Journal of Machine Learning Research, Volume 4 

Full text available: pdf Additional Information: full citation, abstract, index terms

Building probabilistic and decision-theoretic models requires a considerable knowledge 
engineering effort in which the most daunting task is obtaining the numerical parameters. 
Authors of Bayesian networks usually combine various sources of information, such as 
textbooks, statistical reports, databases, and expert judgement. In this paper, we 
demonstrate the risks of such a combination, even when this knowledge encompasses such 
seemingly population-independent characteristics as sensitivity and ... 

25 NSF workshop on industrial/academic cooperation in database systems
Mike Carey, Len Seligman

March 1999 ACM SIGMOD Record, Volume 28 issue 1

Full text available: pdf Additional Information: full citation, index terms



26 Explore...

Wynne Hsu, Mong Li Lee, Bing Liu, Tok Wang Ling 

August 2000 Proceedings of the sixth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(136.53 KB) Additional Information: full citation, references, citings, index terms



27 Data...
neighbors + precision

Sid-Ahmed Berrani, Laurent Amsaleg, Patrick Gros 

November 2003 Proceedings of the twelfth international conference on Information and 
knowledge management 

Full text available: pdf(154.57 KB) Additional Information: full citation, abstract, references, citings, index terms

It is known that all multi-dimensional index structures fail to accelerate content-based 
similarity searches when the feature vectors describing images are high-dimensional. It is 
possible to circumvent this problem by relying on approximate search-schemes trading-off 
result quality for reduced query execution time. Most approximate schemes, however, 
provide either no control or only complex control over the precision of the searches, especially when
retrieving the k nearest neighbors (NNs) of query poi ... 

Keywords: approximate nearest-neighbor searches, multimedia databases, similarity 
searches 
28 Selectivity estimation ...
Lise Getoor, Benjamin Taskar, Daphne Koller

May 2001 ACM SIGMOD Record, Proceedings of the 2001 ACM SIGMOD international
conference on Management of data, volume 30 issue 2

Full text available: pdf(525.74 KB) Additional Information: full citation, abstract, references, citings, index terms

Estimating the result size of complex queries that involve selection on multiple attributes 
and the join of several relations is a difficult but fundamental task in database query 
processing. It arises in cost-based query optimization, query profiling, and approximate 
query answering. In this paper, we show how probabilistic graphical models can be 
effectively used for this task as an accurate and compact approximation of the joint 
frequency distribution of multiple attributes across multiple ... 

29 Reports from KDD-2001: KDD Cup 2001 report
Jie Cheng, Christos Hatzis, Hisashi Hayashi, Mark-A. Krogel, Shinichi Morishita, David Page,
Jun Sese

January 2002 ACM SIGKDD Explorations Newsletter, volume 3 issue 2 

Full text available: pdf(1.96 MB) Additional Information: full citation, abstract, references, citings

This paper presents results and lessons from KDD Cup 2001. KDD Cup 2001 focused on 
mining biological databases. It involved three cutting-edge tasks related to drug design and 
genomics. 

Keywords: Competition, biology, drug design, genomics 

30 Evolving data mining into solutions for insights: Scaling mining algorithms to large
databases

Paul Bradley, Johannes Gehrke, Raghu Ramakrishnan, Ramakrishnan Srikant 
August 2002 Communications of the ACM, volume 45 issue 8 

Full text available: pdf(115.66 KB), html Additional Information: full citation, abstract, references, index terms

Which insights about data structure make it possible to analyze the very large databases 
collected by Internet, business, scientific, and government applications? 

31 Special issue on ...
Lise Getoor, Nir Friedman, Daphne Koller, Benjamin Taskar

March 2003 The Journal of Machine Learning Research, Volume 3 

Full text available: pdf(479.67 KB) Additional Information: full citation, abstract, index terms

Most real-world data is heterogeneous and richly interconnected. Examples include the 
Web, hypertext, bibliometric data and social networks. In contrast, most statistical learning 
methods work with "flat" data representations, forcing us to convert our data into a form 
that loses much of the link structure. The recently introduced framework of probabilistic 
relational models (PRMs) embraces the object-relational nature of structured data by 
capturing probabilistic interactions between att ... 

32 SPARTAN ...
Shivnath Babu, Minos Garofalakis, Rajeev Rastogi 

May 2001 ACM SIGMOD Record, Proceedings of the 2001 ACM SIGMOD international
conference on Management of data, Volume 30 issue 2

Full text available: pdf(240.19 KB) Additional Information: full citation, abstract, references, citings, index terms
While a variety of lossy compression schemes have been developed for certain forms of
digital data (e.g., images, audio, video), the area of lossy compression techniques for 
arbitrary data tables has been left relatively unexplored. Nevertheless, such techniques are 
clearly motivated by the ever-increasing data collection rates of modern enterprises and the 
need for effective, guaranteed-quality approximate answers to queries over massive 
relational data sets. In this paper, we propose SPA ... 

Cheng Yang, Usama Fayyad, Paul S. Bradley

August 2001 Proceedings of the seventh ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(1.11 MB) Additional Information: full citation, abstract, references, citings, index terms

We present a generalization of frequent itemsets allowing for the notion of errors in the 
itemset definition. We motivate the problem and present an efficient algorithm that 
identifies error-tolerant frequent clusters of items in transactional data (customer-purchase 
data, web browsing data, text, etc.). The algorithm exploits sparseness of the underlying 
data to find large groups of items that are correlated over database records (rows). The 
notion of transaction coverage allows us to extend th ... 

Keywords: Error-tolerant frequent itemset, clustering, collaborative filtering, high 
dimensions, query selectivity estimation 

Thomas Eiter, James J. Lu, Thomas Lukasiewicz, V. S. Subrahmanian
September 2001 ACM Transactions on Database Systems (TODS), Volume 26 issue 3 

Full text available: pdf(663.73 KB) Additional Information: full citation, abstract, references, index terms

Although there are many applications where an object-oriented data model is a good way of 
representing and querying data, current object database systems are unable to handle 
objects whose attributes are uncertain. In this article, we extend previous work by 
Kornatzky and Shimony to develop an algebra to handle object bases with uncertainty. We 
propose concepts of consistency for such object bases, together with an NP-completeness 
result, and classes of probabilistic object bases for which consi ... 

Keywords: Consistency, object-oriented database, probabilistic object algebra, 
probabilistic object base, probability, query language, query optimization 

November 1999 Proceedings of the seventh ACM international symposium on Advances
in geographic information systems 

Full text available: pdf Additional Information: full citation, references, index terms



Keywords: GIS, agriculture planning, artificial intelligence, decision support system, expert 
system, geoinformatics, geoinformation system, land evaluation, land use planning 

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Several important time series data mining problems reduce to the core task of finding 
approximately repeated subsequences in a longer time series. In an earlier work, we 
formalized the idea of approximately repeated subsequences by introducing the notion of 
time series motifs. Two limitations of this work were the poor scalability of the motif 
discovery algorithm, and the inability to discover motifs in the presence of noise. Here we 
address these limitations by introducing a novel algorithm insp ... 

Keywords: data mining, motifs, randomized algorithms, time series 

Corinne Clinton Ruokangas, Ole J. Mengshoel

January 2003 Proceedings of the 8th international conference on Intelligent user 
interfaces 

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Weather is a complex, dynamic process with tremendous impact on aviation. While pilots 
often have access to large amounts of aviation weather data, they find it difficult and time- 
consuming to identify weather hazards, due to the sheer amount and cryptic formatting of 
the data. To address this challenge, we have developed information filtering concepts based 
on a unified Bayesian network model, integrating text and graphical weather data in the 
context of specific mission, equipment and personal ... 

Keywords: bayesian models, bayesian networks, data filtering, information management, 
intelligent visualization, situation awareness 

Ihab F. Ilyas, Volker Markl, Peter Haas, Paul Brown, Ashraf Aboulnaga
June 2004 Proceedings of the 2004 ACM SIGMOD international conference on 
Management of data 

Full text available: pdf(559.35 KB) Additional Information: full citation, abstract, references

The rich dependency structure found in the columns of real-world relational databases can 
be exploited to great advantage, but can also cause query optimizers, which usually
assume that columns are statistically independent, to underestimate the selectivities of
conjunctive predicates by orders of magnitude. We introduce CORDS, an efficient and 
scalable tool for automatic discovery of correlations and soft functional dependencies 
between columns. CORDS searches for column pairs that might have ... 

August 1999 Proceedings of the fifth ACM SIGKDD international conference on
Knowledge discovery and data mining 

Keywords: OLAP, approximate query answering, clustering, data cubes, data mining,
density estimation 

Shivnath Babu, Minos Garofalakis, Rajeev Rastogi
Full text available: pdf(259.12 KB) Additional Information: full citation, abstract, references, citings

While a variety of lossy compression schemes have been developed for certain forms of 
digital data (e.g., images, audio, video), the area of lossy compression techniques for 
arbitrary data tables has been left relatively unexplored. Nevertheless, such techniques are 
clearly motivated by the ever-increasing data collection rates of modern enterprises and the 
need for effective, guaranteed-quality approximate answers to queries over massive 
relational data sets. In this paper, we propose SPARTAN ... 
PAT. NO. Title

1 6,701,016 Method of learning deformation models to facilitate pattern matching

2 6,616,529 Simulation and synthesis of sports matches

3 6,556,958 Fast clustering with sparse data

4 6,529,891 Automatic determination of the number of clusters by mixtures of bayesian networks

5 6,496,816 Collaborative filtering with mixtures of bayesian networks

6 6,408,290 Mixtures of bayesian networks with decision graphs

7 6,345,265 Clustering with mixtures of bayesian networks

8 6,336,108 Speech recognition with mixtures of bayesian networks